Computer graphics processor and method for generating a computer graphics image



The invention relates to a computer graphics processor. 

The invention further relates to a method for generating a computer graphics image. 

It is known that 2D filtering can be approximated using two 1D filter passes for a large class
of filters, providing more efficient solutions. See for example: Edwin Catmull and Alvy Ray
Smith, 3-D transformations of images in scanline order. In Computer Graphics (SIGGRAPH
'80 Proceedings), volume 14(3), pages 279 - 285, 1980 and George Wolberg and Terrance
E. Boult, Separable image warping with spatial lookup tables. In Computer Graphics (Proc.
Siggraph '89), volume 23(3), pages 369 - 378, July 1989.

As described therein, such separation can exhibit so-called bottleneck and shear problems,
resulting in blurriness and aliasing respectively. In order to moderate the consequence of
these disadvantageous effects these publications describe four traversal orders. One of these
is chosen to reduce the bottleneck problem. As a solution to shear aliasing, supersampling is
described in the known literature. Supersampling requires an extra downscale filter to reduce
the extra resolution to the required output resolution.



It is a purpose of the present invention to provide a computer graphics processor and method
for generating a computer graphics image wherein such an extra downscale filter is
superfluous. A computer graphics processor of the invention in accordance with this purpose
is claimed in claim 1. A method for generating a computer graphics image of the invention in
accordance with this purpose is claimed in claim 2. In the computer graphics processor
according to the invention the model information providing unit provides information
representing a set of graphics primitives, such as triangles or other polygons, or Bezier shapes.
The information provided may comprise geometrical information defining a shape of the
primitives and/or appearance information defining an appearance of the primitives, such as
texture and color information.

The rasterizer is capable of generating a first sequence of coordinates which coincide with a
base grid u,v associated with the primitive, and is further capable of generating one or more
sequences of interpolated values associated with the first sequence. The further sequence may
include screen (display) coordinates x,y and for example includes normal information for the
surface represented by the primitive.

The color generator is arranged for assigning a color to said first sequence of coordinates
using said appearance information. The color generator may simply use an interpolated color
provided by the rasterizer, but may also perform complicated shading and texturing
operations. The display space resampler is arranged for resampling the color assigned by the
color generator in the base grid for coordinates u,v to a representation in a grid associated
with a display with coordinates x,y. The transformation is carried out in a first and a second
transformation as this substantially reduces computational effort.
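The split into a first and a second transformation can be illustrated on coordinates alone. The sketch below is not taken from the application: it assumes a purely linear mapping with hypothetical coefficients `xu`, `xv`, `yu`, `yv` (the four partial derivatives), ignores the filtering itself, and only checks that mapping u to x along lines of constant v, followed by mapping v to y along columns of constant x, reproduces the direct 2D mapping.

```python
def direct_map(u, v, xu, xv, yu, yv):
    """Map base-grid coordinates (u, v) to display coordinates (x, y) in one step."""
    return xu * u + xv * v, yu * u + yv * v


def first_pass(u, v, xu, xv):
    """First 1D pass: map u to x along a line of constant v (v is unchanged)."""
    return xu * u + xv * v, v


def second_pass(x, v, xu, xv, yu, yv):
    """Second 1D pass: map v to y along a column of constant x (x is unchanged)."""
    # Recover u from the intermediate (x, v) position. This division is
    # where a small xu (the bottleneck case) becomes a problem.
    u = (x - xv * v) / xu
    return x, yu * u + yv * v


def two_pass_map(u, v, xu, xv, yu, yv):
    """Chain both 1D passes; the result should equal direct_map."""
    x, v_unchanged = first_pass(u, v, xu, xv)
    return second_pass(x, v_unchanged, xu, xv, yu, yv)
```

With the hypothetical coefficients `1.5, 0.5, -0.25, 2.0`, both `direct_map(2, 3, ...)` and `two_pass_map(2, 3, ...)` give the same display position.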

The display space resampler uses a transposing facility and a selection facility for selecting
from four options depending on whether the image data is transposed or not, and depending
on the order wherein the first and the second transformation are applied, i.e. whether first the
x or the y coordinate of the display coordinates is calculated. This computer graphics
processor and the method according to the invention include a new selection criterion to
choose between the four alternatives, which also takes shear into account. It was recognized
by the inventors that the second pass filter can act as downsample filter for any
supersampling performed in the first pass, while supersampling in the second pass would
require an extra downsample filter (a third pass). Therefore, after a first selection step to
reduce bottleneck, wherein it is determined whether a transposition should be applied or not,
a choice between the remaining two options is made based on the shear occurring in the two
passes (transformations).

A transposition is intended to include transformations wherein the coordinates in a coordinate
pair are interchanged. This is for example achieved by mirroring the coordinates around a
line x=y or a line x=-y. The same effect is achieved by a forward or a backward rotation over
90°. In this case however, the second transposition should involve a forward rotation if the
first transposition is a backward rotation and vice versa.

The selected option is the one wherein the least amount of shear occurs in the second pass, so
that supersampling in the second pass is avoided. Instead, the worst shear is put in the first
pass where shear aliasing can easily be reduced by using supersampling.

More in particular four options can be defined as follows.
• a first option where

in a first pass, uv image data is resampled into xv image data by mapping u to x for
each line having coordinate v, and

in a second pass the xv image data is transformed into xy image data by mapping v to
y for each line having coordinate x,

• a second option where

in a first pass, uv image data is resampled into uy image data by mapping v to y for
each line having coordinate u, and

in a second pass the uy image data is transformed into xy image data by mapping u to
x for each line having coordinate y,

• a third option where

in a first pass, uv image data is transformed into yv image data by mapping u to y for
each line having coordinate v,

in a second pass the yv image data is transformed into yx image data by mapping v to
x for each line having coordinate y, and

the yx image data is transposed to xy image data,

• a fourth option where

in a first pass, uv image data is transformed into ux image data by mapping v to x for
each line having coordinate u,

in a second pass the ux image data is transformed into yx image data by mapping u to
y for each line having coordinate x, and

the yx image data is transposed to xy image data.
In the third and the fourth option an explicit transposition of the coordinates generated by the
rasterizer is not necessary. Instead the coordinates of the vertices and/or control points
defining the primitives can simply be transposed before generating the coordinates of the
base grid.
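The two-step selection among the four options described above can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: the function name `select_option` is hypothetical, and the transposition test (comparing the summed magnitudes of the two candidate scale-factor pairs) is one plausible reading of "relatively large derivatives occur as scale factors". The second step follows the text directly: of the two remaining pass orders, take the one that puts the larger shear in the first pass.

```python
def select_option(xu, xv, yu, yv):
    """Return the option number (1 to 4) for the partial derivatives
    xu = dx/du, xv = dx/dv, yu = dy/du, yv = dy/dv.

    Roles of the derivatives per option (first-pass scale, first-pass shear,
    second-pass scale, second-pass shear):
      option 1: xu, xv, yv, yu      option 2: yv, yu, xu, xv
      option 3: yu, yv, xv, xu      option 4: xv, xu, yu, yv
    """
    # Step 1 (assumed criterion): transpose only if the off-diagonal
    # derivatives make the better pair of scale factors.
    straight = abs(xu) + abs(yv)
    transposed = abs(xv) + abs(yu)

    if straight >= transposed:
        # Shear candidates are xv (if u->x runs first) and yu (if v->y runs first).
        # Put the larger shear in the first pass.
        return 1 if abs(xv) >= abs(yu) else 2

    # Transposed case: shear candidates are yv (u->y first) and xu (v->x first).
    return 3 if abs(yv) >= abs(xu) else 4
```

For a mapping close to a 90° rotation, for example `select_option(0.1, -1.0, 1.0, 0.2)`, the sketch picks a transposed option, which corresponds to transposing the vertex coordinates before the base grid is generated.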

These and other aspects of the invention are described in more detail in the attached article
"Pixel shading and forward texture mapping", pp 1-7, by Bart Barenbrug, intended for
publication in Graphics Hardware (2003), M. Doggett, W. Heidrich, B. Mark, A. Schilling
(Editors).

Possible implementations of the model information providing unit, the rasterizer and the
color generator are described in the European patent application 03100313.0 filed
13 FEB 2003 by the same inventors.




CLAIMS: 



1. A computer graphics processor comprising a model information providing unit
for providing information representing a set of graphics primitives, a rasterizer capable of
generating a first sequence of coordinates which coincide with a base grid associated with the
primitive, a color generator for assigning a color to said first sequence of coordinates, and a
display space resampler for resampling the color assigned by the color generator in the base
grid for coordinates u,v to a representation in a grid associated with a display with
coordinates x,y, in a first and a second transformation, carried out in a first and a second pass,
and optionally including a transposition,
the processor having a selection facility for selecting the order of the transformations and
selecting whether to apply a transposition or not based on an evaluation of the partial
derivatives $\partial x/\partial u$, $\partial x/\partial v$, $\partial y/\partial u$, $\partial y/\partial v$,
two of which determine shear and two of which determine scaling in the
transformations, the selection being made wherein relatively large derivatives occur as scale
factors, and relatively small derivatives occur as shear factors and wherein the lowest amount
of shear occurs during the second transformation.

2. Method for generating a computer graphics image, comprising the following
steps:

providing information representing a set of graphics primitives,
generating a first sequence of coordinates which coincide with a base grid associated with the
primitive,
assigning a color to said first sequence of coordinates using information representing the
graphics primitives,
resampling the color assigned by the color generator in the base grid for coordinates u,v to a
representation in a grid associated with a display with coordinates x,y, in a first and a second
pass,
evaluating the partial derivatives $\partial x/\partial u$, $\partial x/\partial v$,
$\partial y/\partial u$, $\partial y/\partial v$ of the coordinates in the display with
respect to the coordinates in the base grid, two of which determine shear and two of which
determine scaling in the transformations, the selection being made wherein relatively large
derivatives occur as scale factors, and relatively small derivatives occur as shear factors and
wherein the lowest amount of shear occurs during the second transformation.



3. Computer graphics [...] to claim 1 or 2, wherein the execution of the third [...]
and the second transformation are combined.



4. Method for generating a computer graphics image, comprising the following
steps:

providing information representing a set of graphics primitives, the
information comprising at least geometrical information defining a shape of the primitives
and appearance information defining an appearance of the primitives,

generating a first sequence of coordinates which coincide with a base grid
associated with the primitive, and generating one or more sequences of interpolated values
associated with the first sequence,

assigning a color to said first sequence of coordinates using said appearance
information,

resampling the color assigned by the color generator in the base grid for
coordinates u,v to a representation in a grid associated with a display with coordinates x,y, in
a first and a second pass,

evaluating the partial derivatives $\partial x/\partial u$, $\partial x/\partial v$,
$\partial y/\partial u$, $\partial y/\partial v$ of the coordinates in the
display with respect to the coordinates in the base grid,



selecting one of the following options:
a first option where
in a first pass, uv image data is resampled into xv image data by mapping u to x for each line
having coordinate v, and
in a second pass the xv image data is transformed into xy image data by
mapping v to y for each line having coordinate x,

a second option where
in a first pass, uv image data is resampled into uy image data by mapping v to y for each line
having coordinate u, and
in a second pass the uy image data is transformed into xy image data by mapping u to x for
each line having coordinate y,

a third option where
in a first pass, uv image data is transformed into yv image data by mapping u to y for each
line having coordinate v,
in a second pass the yv image data is transformed into yx image data by mapping v to x for
each line having coordinate y, and
the yx image data is transposed to xy image data,

a fourth option where
in a first pass, uv image data is transformed into ux image data by mapping v to x for each
line having coordinate u,
in a second pass the ux image data is transformed into yx image data by mapping u to y for
each line having coordinate x, and
the yx image data is transposed to xy image data,

from the options to transpose or not, the option being selected where a matrix representing the
result of the first and the second filter has relatively high scale factors, and the order for
applying the first filter and the second filter being selected wherein the highest shear occurs
during the first pass.

ABSTRACT:



A computer graphics processor is described comprising a model information
providing unit for providing information representing a set of graphics primitives, a rasterizer
capable of generating a first sequence of coordinates which coincide with a base grid
associated with the primitive, a color generator for assigning a color to said first sequence of
coordinates, and a display space resampler for resampling the color assigned by the color
generator in the base grid for coordinates u,v to a representation in a grid associated with a
display with coordinates x,y, in a first and a second transformation. The transformation is
carried out in a first and a second pass, and optionally includes a transposition. The order of
the passes and the decision to apply a transposition or not is based on an evaluation of the
partial derivatives $\partial x/\partial u$, $\partial x/\partial v$, $\partial y/\partial u$,
$\partial y/\partial v$, two of which determine shear and two of which
determine scaling in the transformations. The option is selected wherein relatively large
derivatives occur as scale factors, and relatively small derivatives occur as shear factors and
wherein the lowest amount of shear occurs during the second transformation.

Figure 3 of pdf article.

Pixel shading and forward texture mapping

Bart Barenbrug

Abstract



[...] in recent years allowed for custom colour computation [...] increase quality,
prefiltering approaches become now [...] with prefiltering at a cost comparable to [...]
of supersampling. But architectures proposed [...] forward texture mapping. This combined
architecture features both the advantages provided by programmable pixel shading and the
high quality antialiasing provided by the prefilter technique.

Categories and Subject Descriptors (according to ACM CCS): I.3.1 [Computer Graphics]:
Graphics processors I.3.3 [Computer Graphics]: Display algorithms I.3.7 [Computer
Graphics]: Color, shading, shadowing, and texture



1. Introduction

Programmable vertex and pixel shading has in recent years paved the way towards more
programmable graphics pipeline architectures and opened up a host of possibilities for
improving the fidelity of real-time computer generated imagery, enabling effects similar to
those used in off-line rendering for the movie industry.

Another issue when it comes to high-fidelity pictures is anti-aliasing (both texture
anti-aliasing and edge anti-aliasing). Current super-sampling and multi-sampling techniques
are computationally intensive and require a lot of memory bandwidth. In a graphics pipeline
architecture using forward texture mapping, prefiltering can easily be used to avoid aliasing
artifacts [2]. Such systems so far however do not support pixel shading.

This paper presents a novel graphics pipeline architecture featuring both programmable
shaders and forward texture mapping prefiltering techniques, providing both high-quality
anti-aliasing and all the pixel shading effects. The changes required to transform a traditional
pipeline into this new architecture are presented, along with details on for example how to
handle mipmapping in a forward mapping architecture, and on how to solve the bottleneck
and shear problems that occur when using two-pass resampling.

The paper is structured as follows: in Section 2, the previous work will be presented in the
form of a walk down the traditional pixel shading pipeline and the forward texture mapping
pipeline. Then, in Section 3, the combined architecture will be presented and some of its
features discussed. Results are then presented in Section 4 followed by some conclusions in
Section 5.



2. Previous Work

Rendering computer generated images is done using many different techniques for many
different purposes. High-volume consumer level products are based on screen-space
scanline algorithms which render primitives one by one by traversing the pixels on the screen
to which the primitive projects. Inverse texture mapping is used to determine the colour that
a primitive contributes to such a pixel.

Figure 1: A traditional pixel shading architecture



With some modifications in the later stages (rasterisation and beyond) of such a graphics
pipeline, forward texture mapping can be used as well. With forward texture mapping, the
projection of the primitive on the texture map is traversed, rather than the projection on the
screen, and the texture samples (texels) which are encountered are mapped and splatted onto
the screen. The sometimes quite inverse nature of programmable pixel shading (used when
for example doing dependent texture look-ups, or performing environment-mapped bump
mapping) does not seem to fit well with this technique. Forward texture mapping does
however provide very high-quality anti-aliasing at moderate cost.

Before presenting a combined architecture, first the traditional pixel shading architecture is
discussed to sketch the context, as well as the forward texture mapping architecture
described in the paper by Meinds [5]. The building blocks from these two architectures can
then later be combined into an architecture featuring both techniques.

2.1. Traditional pixel shading architectures

Figure 1 shows the architecture of the last stages of a graphics pipeline featuring
programmable vertex and pixel shading.

The programmable vertex shader (together with the frame buffer at the end of the pipeline)
provides the context of the changes to the architecture which we will describe later. The
vertex shader [5] can modify and set up data for the pixel shader for each vertex of a
primitive that is to be rendered. The data provided by the vertex shader to the rasteriser (for
interpolation) usually includes attributes like diffuse and/or specular colour, texture
coordinates, (homogeneous) screen coordinates, and sometimes extra data like surface
normals or other data required for the shading process.

These attributes are offered to the screen-space rasteriser which uses a scanline algorithm to
traverse the pixels which lie within the projection of the primitive on the screen, by selecting
the screen coordinates from the vertex attributes as driving variables for the rasterisation
process. The rasteriser interpolates the attributes to provide values for each of the pixels
visited. Interpolation accounts for the perspective mapping from world space to screen space.

The attributes are then available at each pixel location, where a pixel shader [11, 12, 13] can
use them to compute the local surface colour. When texture data is needed, the texture space
resampler is used to obtain a texture sample given the texture coordinates. These texture
coordinates are generated by the pixel shader based on the interpolated coordinates received
from the rasteriser and any results from previous texture fetches (so-called dependent
texturing) and/or calculations. The texture filter operation is usually based on bi-linear or
tri-linear interpolation of nearby texels, or combinations of such texture probes to
approximate an anisotropic (perspectively transformed) filter footprint.

After the surface colour for a pixel has been determined, the resulting pixel fragment is sent
onwards to the Edge Anti-Aliasing and Hidden Surface Removal (EAA & HSR) unit.
Usually, this unit uses a Z-buffer for hidden surface removal and multi-sampling (with the
associated sub-sample buffer and downsample logic) for edge anti-aliasing. Downsampling
is applied when all primitives have been rendered using a (usually box) prefilter to combine
sub-samples to colours at the final pixel resolution.

2.2. A forward texture mapping architecture 

A forward texture mapping architecture, further described in the paper by Meinds [8], is
depicted in Figure 2. This architecture does not support pixel shading, and the vertex shader
is a traditional Transform and Lighting unit.

As in the traditional pipeline, the T&L unit delivers attributes for the vertices of a primitive.
Unlike the traditional screen space rasteriser, the texture space rasteriser traverses the
projection of the primitive onto the texture map (rather than the screen), by selecting the
texture coordinates (instead of screen coordinates) as the driving variables for the
rasterisation process. All attributes are interpolated (linearly, except for the screen
coordinates to which a texel is projected, which are interpolated perspectively).
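A minimal sketch of what "interpolated perspectively" can mean for the screen coordinates, assuming the usual homogeneous formulation: the homogeneous screen coordinate and the homogeneous `w` are interpolated linearly in texture space, and the division by `w` is done per texel. The function names and the parameter `t` (a normalised position between two vertices in texture space) are hypothetical and not taken from the paper.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def screen_x(t, x0, w0, x1, w1):
    """Screen x of a texel at texture-space parameter t between two vertices.

    x0, x1 are the screen x coordinates of the vertices and w0, w1 their
    homogeneous w values. The products x * w are linear in texture space,
    so they are interpolated linearly and divided by the interpolated w.
    """
    xh = lerp(x0 * w0, x1 * w1, t)  # homogeneous screen coordinate
    w = lerp(w0, w1, t)
    return xh / w
```

Halfway between a vertex at screen x = 0 with w = 1 and a vertex at screen x = 10 with w = 3, this gives 7.5 rather than the 5.0 that plain linear interpolation of the screen coordinate would give.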

The texture space rasteriser traverses the texture map on a grid corresponding to 4D
mipmapping (for more details on mipmaps, see Heckbert's survey of texture mapping).
A texture fetch then amounts to 4D mipmap reconstruction from the 3D mipmap data stored
in the texture memory. A fetched texel can be combined with interpolated diffuse and/or
specular colour resulting in a colour sample of the surface with associated (generally
non-integer) screen coordinates which indicate where this texture sample is mapped to on
screen.

submitted to Graphics Hardware (2003)

The screen space resampler splats mapped texels to integer screen positions, providing the
image of the primitive on the screen. The 2D resampling operations can efficiently be
executed in two 1D resample passes using 1D FIR filter structures.

The pixel fragments coming from the screen space resampler are then combined in the
EAA & HSR unit, which uses a fragment buffer [8]. Pixel fragments are depth-sorted into
this buffer to solve the hidden surface problem. After all primitives have been rendered, all
visible fragments for each pixel are combined (which mostly amounts to simple summation
since the screen space resampler delivers colours already weighted by the prefilter) and sent
to the frame buffer. Edge anti-aliasing results from combining the partial contributions
generated by the screen space rasteriser near the edges, resulting in a final pixel colour which
is a combination of colours from different primitives.

Techniques used in the A-buffer, the Z3 buffer or the WarpEngine could be used in the
EAA & HSR unit, although the fragment buffer described in the paper from Meinds [8]
differs from these in that it sees the colours in the buffer from one pixel as partial
contributions to the whole filter footprint of that pixel (possibly partially behind one another)
instead of colours for a certain position on a super-sample grid (which are always next to
each other).

3. A combined architecture

Pixel shading often amounts to combining several textures in one way or another. This can
easily be done in a traditional architecture since the texture space resampler resamples all
textures to the common screen pixel grid, where the texture samples for each pixel can be
combined.

In the forward texture mapping architecture, texture samples [...] can only traverse one grid
at a time, so when rasterisation takes place on a texture grid (as in the forward [...]) [...]
select one texture out of many, and traverse the primitive on the associated grid. Multiple
textures can be handled in a multi-pass fashion, so that they can be combined after they are
resampled to the screen pixel grid. But that congests the fragment buffer in the EAA & HSR
unit. It also precludes advanced features such as dependent texturing, or texture modes which
are of an inverse nature, such as environment-mapped bump mapping (where the bump map
information at each grid position determines where the environment map is indexed,
possibly resulting in a one-to-many forward mapping from environment map to the primitive
surface and the [...]


To avoid these problems we can make sure that the screen space resampler can map texture
samples as if there was only one texture map associated with the primitive. To this end,
shading of the surface colour should be performed before the screen space resampling
process. This is illustrated in Figure 3, which shows a hybrid graphics pipeline.

3.1. Overview

In this hybrid pipeline, the rasteriser traverses the primitive over a "surface grid", i.e. over a
grid in a coordinate system that provides a 2D parameterisation of the surface of the
primitive. The grid associated with a texture map provides such a surface grid, and is
preferably used as surface grid (since obtaining texture samples on a texture grid does not
require resampling). But in case of absence of texture maps, or when for example textures
are 1D or 3D, another grid can be chosen. This is described in more detail in Section 3.2.1.
Attributes can be linearly interpolated over this grid, except (as in the case of forward texture
mapping) for the perspectively mapped screen coordinates [...] position.

[...] (including secondary texture coordinates) to the positions on this grid. The pixel shader
then operates on the attributes on grid positions in this surface grid and if there are any
secondary textures associated with the primitive, it uses inverse mapping with standard
texture space resamplers to obtain colours from these textures. If the surface grid was
selected from a texture map, fetching texels for this "primary" texture stage boils down to the
4D mipmap reconstruction process from the forward mapping pipeline. This process is a
form of axis-aligned anisotropic filtering, and anisotropic filtering is often available in
standard texture space resamplers. The fetched texture samples and other attributes can be
used by the pixel shader program to compute the surface colour for the current grid position.

Figure 3: Pixel shading and forward mapping architectures combined

Once the sample on the surface grid has been shaded, the screen space resampler is used to
splat the colour to the screen pixels, where an EAA & HSR unit can combine contributions
from different primitives.

3.2. Details

Using the hybrid architecture has several consequences and enables several nice features.
Those will be discussed here for the different stages of the hybrid pipeline along with some
discussion of changes to these stages with respect to the traditional pipeline.

The rasteriser will have to choose a primitive grid (3.2.1), avoid bottleneck and shear
problems (3.2.2), and control mipmapping (3.2.3). The programmable pixel shader and
texture space resamplers remain the same as in a traditional pixel shading pipeline (3.2.4),
and there are some options for the screen space resampler (3.2.5).

3.2.1. Choosing the surface grid

The surface space rasteriser will contain an extra, first pipeline stage (next to the regular
setup and traversal stages) in which the surface grid is chosen. This is done before the
regular rasteriser setup.

Preferably, the surface grid is derived from one of the texture maps, so that this texture map
does not have to be resampled (apart from 4D mipmapping) by the texture space rasteriser.
To this end, the grid setup stage can examine the texture maps which are associated with the
primitive. For a texture map to be eligible to provide the surface grid, it has to fulfil three
requirements. First, it must not be addressed dependently. Second, it has to be a 2D texture
(1D and 3D textures are not suitable for traversing a 2D surface). Third, the texture
coordinates at the vertices should not make up a degenerate primitive (where for example all
the texture coordinates line up, yielding in effect a 1D texture).

If more than one texture is eligible, we select the texture with the largest area in texture
space: this is the texture with potentially the most detail and highest frequencies (so best to
avoid the texture space resampling process for this texture, since this process can give rise to
unneeded blur and aliasing).
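The eligibility test and the largest-area rule described above can be sketched as follows. The `TextureStage` record, its field names and the `eps` threshold are hypothetical; the sketch assumes a triangle primitive whose texture coordinates are already expressed in texels, so that areas of different textures are comparable.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TextureStage:
    name: str
    dimensions: int                    # 1, 2 or 3
    dependent: bool                    # True if addressed dependently
    coords: List[Tuple[float, float]]  # (u, v) in texels at the three vertices


def texture_area(coords):
    """Area of the triangle spanned by the three texture coordinates."""
    (u0, v0), (u1, v1), (u2, v2) = coords
    return abs((u1 - u0) * (v2 - v0) - (u2 - u0) * (v1 - v0)) / 2.0


def choose_surface_grid(stages, eps=1e-12) -> Optional[TextureStage]:
    """Return the eligible stage with the largest texture-space area, or None."""
    eligible = [
        s for s in stages
        if not s.dependent                  # first requirement
        and s.dimensions == 2               # second requirement
        and texture_area(s.coords) > eps    # third: not degenerate
    ]
    if not eligible:
        return None  # caller falls back to a dummy grid
    return max(eligible, key=lambda s: texture_area(s.coords))
```

A return value of `None` corresponds to the case where no texture is eligible and a dummy grid has to be assigned instead.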

If there is no eligible texture available (in case of a simple Gouraud shaded primitive for
example), a dummy grid over the surface of the primitive can be chosen for the rasteriser to
traverse, by assigning dummy "texture coordinates" to each vertex. This dummy grid is then
traversed by the rasteriser as if it were a texture grid (except that texture fetching for these
coordinates is disabled). An advantage is that the resulting linear interpolation over the
surface provides for perspective correct Gouraud shading, as noted by Chen [1]. Assigning
the x and y screen coordinates of each vertex as dummy texture coordinates is an option.
Note that this choice does not mean that the rasteriser traverses screen pixels, since the
homogeneous w for each vertex is still taken into account when mapping a "texel" to the
screen. Another option (for planar surfaces such as triangles) is rotating the 3D vertex
coordinates in eye space, such that the normal of the rotated surface aligns with the eye
space z-axis, and then selecting the rotated eye space x and y coordinates as dummy grid
coordinates for each vertex.

The ability to traverse any grid over the surface of the primitive provides a lot of freedom.
This freedom could be used for example to avoid any bottleneck and shear problems which
might be associated with the mapping of a particular texture grid to the screen (see Section
3.2.2). It could be used to traverse non-planar primitives (such as traversing a Bezier patch
via the 2D grid determined by its surface parameterisation, and using the forward mapping to
directly splat the surface colours to the screen). It could also be used to rasterise in the
direction of the motion of the primitive so that a 1D filter can be applied along the
rasterisation direction to efficiently implement motion blur [7].

3.2.2. Bottleneck and shear

If the screen space resampler is implemented using 2-pass filtering (see Section 3.2.5), the
rasteriser has to prepare for this in addition to the setup required for primitive surface
traversal. Two-pass filtering is known to suffer from bottleneck and shear problems.




Figure 4: The bottleneck problem



The bottleneck problem is illustrated in Figure 4 where the area of the intermediate image
becomes very small relative to the input and output images. It occurs with rotations close to
90°, and results in excessive blur in the direction of the second pass (since the second pass
has to magnify the collapsed intermediary image again).
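As a worked example of the collapse, consider a pure rotation over an angle θ with the horizontal pass (u to x) done first. This is an illustrative derivation under that assumption, not a formula quoted from the paper: the first pass scales each line by |cos θ|, so the intermediate image has |cos θ| times the input area, and the second pass has to magnify by 1/|cos θ| to restore it. The function name `pass_scales` is hypothetical.

```python
import math


def pass_scales(angle_degrees):
    """Scale factors of the two 1D passes for a rotation, u->x pass first.

    With x = u*cos(t) - v*sin(t) and y = u*sin(t) + v*cos(t):
      first pass  (constant v): dx/du = cos(t)
      second pass (constant x): dy/dv = 1 / cos(t)
    """
    theta = math.radians(angle_degrees)
    first = abs(math.cos(theta))
    second = math.inf if first == 0.0 else 1.0 / first
    return first, second


# Show how the intermediate image collapses as the rotation approaches 90 degrees.
for angle in (0, 45, 80, 85, 89):
    first, second = pass_scales(angle)
    print(f"{angle:2d} deg: first pass x{first:.3f}, second pass x{second:.1f}")
```

At 85° the intermediate image keeps less than a tenth of its width and the second pass has to magnify by more than a factor of ten, which is the excessive blur described above.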

Figure 5: Shear causing vertical aliasing in a horizontal pass (axes: u = 0 to 7 for the
texture map, x = 0 to 7 for the intermediate image)

The shear problem is illustrated in Figure 5 where two lines of a texture map and
intermediate image are shown. The texture map has a black vertical line, and the shear of the
perspective mapping is such that the black pixel on the second line ends up five pixels more
to the right than the black pixel on the first line. The horizontal filter pass prevents
horizontal aliasing, but not vertical aliasing. The shear causes a very sharp transition between
black pixels on one line, and white pixels on the next. Also, the line in the intermediate
image consists of disjunct parts, separated by columns to which the line does not contribute
(e.g. for x = 3).

A solution 17 to the shear problem is to rasterise at a finer
resolution, causing extra intermediate lines to be generated,
with a black pixel at intermediate positions filling the holes.
If this is done for first pass shear, the second pass will
downscale the larger intermediate image to its final
resolution. If shear in the second pass would be treated using
the same super-sampling approach, a third pass would be
needed to reduce the generated higher horizontal resolution to
the output resolution.
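
The hole-filling effect of rasterising at a finer resolution can
be sketched as follows. This is only an illustration under an
assumption that the text does not state: that the holes are
closed once the black pixel moves by at most one pixel between
consecutive intermediate lines. The function names are
hypothetical.

```python
import math

def lines_per_input_line(shear_pixels_per_line):
    """Intermediate lines to rasterise per texture line.

    Assumption: holes disappear when the horizontal offset between
    consecutive intermediate lines is at most one pixel.
    """
    return max(1, math.ceil(abs(shear_pixels_per_line)))

def covered_columns(shear, factor, lines=2):
    """Columns hit by a vertical texture line at u = 0 after shearing.

    `factor` intermediate lines are generated per texture line.
    """
    steps = (lines - 1) * factor
    return {round(shear * i / factor) for i in range(steps + 1)}

# Shear of five pixels per line, as in the example of Figure 5.
print(sorted(covered_columns(5, 1)))  # only the two end columns are hit
print(sorted(covered_columns(5, lines_per_input_line(5))))  # no gaps
```

With a factor of 1 the line only touches columns 0 and 5; with
the finer rasterisation every column in between receives a
contribution as well.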

Much like in the paper by Catmull and Smith, we avoid
the bottleneck problem by choosing between the four
options obtained by deciding (1) to generate the output image
straight away, or generate a transposed version and transpose
the generated image, and (2) doing the horizontal pass first,
or doing the vertical pass first. However, we use different
criteria to choose between these options to try to avoid shear
in the second pass. In this way, we will not have to generate
extra image columns to counter shear in the second pass, but
at worst have to only generate intermediate lines to counter
shear aliasing in the first pass. This avoids having the penalty
of a third pass.

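Consider a local linearisation of the mapping around some
point p:

$$
\begin{pmatrix} x - x_p \\ y - y_p \end{pmatrix}
=
\begin{pmatrix}
\partial x/\partial u & \partial x/\partial v \\
\partial y/\partial u & \partial y/\partial v
\end{pmatrix}
\begin{pmatrix} u - u_p \\ v - v_p \end{pmatrix}
$$

The derivatives used there are good indicators for the
bottleneck and shear problems (at point p). For the case where
in the first pass u is mapped to x and in the second pass v
to y, $\partial x/\partial u$ is the scale factor for the first
pass: if this is small (close to zero), the intermediate image
will collapse and exhibit the bottleneck problem.
$\partial y/\partial v$ is the scale factor for the second pass,
and $\partial x/\partial v$ and $\partial y/\partial u$ are the
amounts of shear for the first and second pass respectively.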
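
As a rough numerical illustration of these indicators, the
sketch below estimates the four derivatives by central
differences for a pure rotation of 85 degrees. The helper names
(`jacobian`, `rotation`) and the choice of a rotation as test
mapping are assumptions made for this example only.

```python
import math

def jacobian(mapping, u, v, h=1e-5):
    """Central-difference estimate of ((dx/du, dx/dv), (dy/du, dy/dv))."""
    x1, y1 = mapping(u + h, v)
    x0, y0 = mapping(u - h, v)
    x3, y3 = mapping(u, v + h)
    x2, y2 = mapping(u, v - h)
    return (((x1 - x0) / (2 * h), (x3 - x2) / (2 * h)),
            ((y1 - y0) / (2 * h), (y3 - y2) / (2 * h)))

def rotation(theta):
    """Mapping (u, v) -> (x, y) that rotates by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda u, v: (c * u - s * v, s * u + c * v)

J = jacobian(rotation(math.radians(85)), 0.0, 0.0)
first_pass_scale = J[0][0]   # dx/du, close to zero: bottleneck
first_pass_shear = J[0][1]   # dx/dv
second_pass_shear = J[1][0]  # dy/du
second_pass_scale = J[1][1]  # dy/dv
print(first_pass_scale, first_pass_shear)
```

For this rotation the first pass scale factor equals cos(85°),
which is below 0.1, so mapping u to x first would collapse the
intermediate image as described for rotations close to 90°.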

Each of the four derivatives is in the role of first pass scale
factor for one of the four options for combating the
bottleneck problem. Choosing the option which corresponds to
the largest first pass scale factor (thereby maximising the
intermediate image area 2) reduces the bottleneck problem, but
may leave shear in the second pass rather than in the first.
To also take shear into consideration, we select between the
four options in two stages.

First we decide how to pair up the coordinates: map u
to x and v to y, or map u to y and v to x. We do this by
looking at the derivatives at the primitive's vertices and see
how well u and v correlate to x and y respectively. We pair
up the coordinates such that the two derivatives that correlate
the strongest become scale factors, and the other two become
shear factors. This amounts to the "transpose or not" choice
from Catmull and Smith.

Second, we choose the order of the passes. Swapping the
order of the passes puts the first pass scale and shear factors
in the second pass, and vice versa. We choose the order that
puts the least amount of shear in the second pass. This choice
also determines which of the two scale factors will be the
one for the first pass, and which one will be the scale factor
for the second pass. This may yield a sub-optimal choice 
as far as the bottleneck problem is concerned, but since the 
first criterion selects strongly correlated coordinates as scale 
factors, both scale factors are usually large enough to avoid 
the bottleneck problem. So this selection process provides a 
good compromise between avoiding bottleneck and second 
pass shear problems. 
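
The two-stage selection described above could be sketched as
follows. The text does not say how the derivatives at the
vertices are combined or how "correlate the strongest" is
measured; summing absolute values over the vertices is an
assumption of this sketch, and `Pass` and `choose_traversal` are
hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Pass:
    maps: str     # which grid coordinate feeds which screen coordinate
    scale: float  # summed |scale factor| over the vertices
    shear: float  # summed |shear factor| over the vertices

def choose_traversal(jacobians):
    """jacobians: one ((dx/du, dx/dv), (dy/du, dy/dv)) per vertex."""
    def total(i, j):
        return sum(abs(J[i][j]) for J in jacobians)
    xu, xv, yu, yv = total(0, 0), total(0, 1), total(1, 0), total(1, 1)

    # Stage 1: the stronger pair of derivatives become the scale
    # factors, the other pair become the shear factors.
    transpose = (xv + yu) > (xu + yv)
    if transpose:
        pass_u = Pass("u->y", scale=yu, shear=yv)
        pass_v = Pass("v->x", scale=xv, shear=xu)
    else:
        pass_u = Pass("u->x", scale=xu, shear=xv)
        pass_v = Pass("v->y", scale=yv, shear=yu)

    # Stage 2: the pass with the least shear is done second.
    if pass_v.shear <= pass_u.shear:
        first, second = pass_u, pass_v
    else:
        first, second = pass_v, pass_u
    return transpose, first, second

transpose, first, second = choose_traversal([((1.0, 0.8), (0.1, 1.0))])
print(transpose, first.maps, second.maps)
```

In this example the diagonal derivatives dominate, so no
transpose is chosen, and the pass mapping v to y is placed
second because its shear factor (0.1) is the smaller one.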

3.2.3. Mipmapping

The rasteriser must sample the surface of the primitive at a
resolution suitable for the resolution of its projected screen
image. To this end, the grid is traversed in a way similar to
mipmapping in a traditional pipeline. When more or less
detail is required, the rasteriser can change mipmap level by
shifting the delta values used to increment the grid
coordinates and the interpolated attributes. In this way, the
rasteriser takes larger or smaller steps over the surface grid.

The rasteriser maintains the screen coordinates associated
to each grid position via the perspective mapping. Using
these, it can determine if a mipmap switch is in order by
making sure that the difference between the screen
coordinates of subsequent grid positions remains within a
suitable range (between ½ and 1 for example, though a mipmap
level bias can be employed of course).
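
A minimal sketch of such a mipmap switch is given below. It
assumes that level 0 is the finest grid, that the delta is held
as an integer (fixed point) so that a switch is a single shift,
and that the screen step scales linearly with the delta locally.
The function name, the `max_level` parameter and the default
range are illustrative.

```python
def adjust_mip_level(level, delta, screen_step,
                     lo=0.5, hi=1.0, max_level=10):
    """Shift the grid delta until the screen-space distance between
    subsequent grid positions lies within [lo, hi].

    Returns the new (level, delta, screen_step).
    """
    # Steps too large on screen: go to a finer level (halve the delta).
    while screen_step > hi and level > 0:
        level -= 1
        delta >>= 1
        screen_step /= 2
    # Steps too small on screen: go to a coarser level (double the delta).
    while screen_step < lo and level < max_level:
        level += 1
        delta <<= 1
        screen_step *= 2
    return level, delta, screen_step

print(adjust_mip_level(0, 16, 0.2))
print(adjust_mip_level(3, 64, 2.4))
```

In the first call the steps are too small, so the rasteriser
moves two levels coarser; in the second call it moves two levels
finer.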

The above-mentioned process does not directly provide
the mipmap level to be used for fetching samples from any
associated texture map, as it only ensures a proper
surface-to-screen mipmap level. For each texture map, there is
also a scaling factor associated with the mapping from the
texture grid to the surface grid. This scaling factor
corresponds to a texture-to-surface mipmap level which can be
added to the surface-to-screen mipmap level to arrive at the
required overall texture-to-screen mipmap level.

The texture-to-surface mipmap level can directly be
obtained from the delta values used to interpolate the texture
coordinates over the surface of the primitive, and since this
interpolation is linear, the texture-to-surface mipmap level
is constant per primitive and can be stored by the rasteriser
setup in a register associated to each texture resampling
stage for use as an offset.
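
The addition of the two mipmap levels might look as follows.
The text does not give the formula that turns the deltas into a
level; taking the base-2 logarithm of the largest texel step per
surface grid step, as in conventional mipmapping, is an
assumption here, as are the function names and the clamp at
level 0.

```python
import math

def texture_to_surface_level(dtu_du, dtv_du, dtu_dv, dtv_dv):
    """Per-primitive level offset from the texture coordinate deltas.

    (tu, tv) are texture coordinates in texels, (u, v) are surface
    grid coordinates; the deltas are constant because the
    interpolation over the surface is linear.
    """
    step = max(math.hypot(dtu_du, dtv_du), math.hypot(dtu_dv, dtv_dv))
    return math.log2(step)

def texture_fetch_level(surface_to_screen_level, tex_offset):
    """Overall texture-to-screen level used for fetching texels."""
    return max(0.0, surface_to_screen_level + tex_offset)

offset = texture_to_surface_level(2.0, 0.0, 0.0, 2.0)  # 2 texels per step
print(texture_fetch_level(2, offset))
```

The offset is computed once during setup and then only added to
the surface-to-screen level that changes during traversal.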

3.2.4. Programmable shading and texture space
resampling 

The programmable "pixel" shader and texture space
resampler (of which there may be one or more, for serial or
parallel fetching of texture samples) are exactly the same as
those in a traditional pipeline. The pixel shader receives a
set of (interpolated) attributes, including texture and screen
coordinates, for one location. The texture coordinates, along
with the shading program, determine where to index the
texture maps via the texture space resampler. The shader can
also modify texture coordinates before sending them to the
texture space resampler to implement dependent texturing,
exactly in the same way as in a traditional pipeline.

The programmable shader passes the shaded colour on to
the screen space resampler, along with the associated screen
coordinates. These in general are not integer, but this is
similar to how a pixel shader in a traditional pixel shader
pipeline might receive sub-pixel screen positions when
performing super-sampling. The shaded colour is the result of
computations for one location, and does not depend on the grid
traversed by the rasteriser. This means that existing shader
programs will not have to be modified to run on the proposed
architecture.
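
To make the data flow concrete, the sketch below shades one
surface location with a dependent texture lookup and a light
map, and hands the colour on together with its (non-integer)
screen coordinates. The attribute keys, the texture names and
the `fetch` callback, which stands in for the texture space
resampler, are all hypothetical.

```python
def shade(attrs, fetch):
    """Shade one location; independent of the grid being traversed.

    attrs: interpolated attributes for this location.
    fetch(name, u, v): returns a tuple of channel values.
    """
    u, v = attrs["tex"]
    du, dv = fetch("offset_map", u, v)[:2]       # first lookup
    r, g, b = fetch("base_map", u + du, v + dv)  # dependent lookup
    lr, lg, lb = fetch("light_map", *attrs["light_tex"])
    colour = (r * lr, g * lg, b * lb)
    # The screen coordinates are passed through unchanged for the
    # screen space resampler.
    return colour, attrs["screen"]

def fake_fetch(name, u, v):
    """Toy textures used only to exercise the shader."""
    if name == "offset_map":
        return (0.25, 0.0, 0.0)
    if name == "base_map":
        return (u, v, 1.0)
    return (0.5, 0.5, 0.5)

attrs = {"tex": (0.5, 0.5), "light_tex": (0.0, 0.0), "screen": (10.3, 4.7)}
print(shade(attrs, fake_fetch))
```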

There are advantages and disadvantages to performing the
programmable shading in surface space. Next to the
high-quality anti-aliasing enabled by the forward mapping, a
main advantage is a separation of concerns when it comes to
the perspective mapping. The texture space resampler now does
not have to deal with the perspective mapping to the screen.
Most of the time it will be used to perform an affine
transformation from one texture grid to another. Standard
bilinear/trilinear probe based texture space resamplers can
better approximate the filter footprints required for such an
affine mapping than the more general shaped footprints
required for perspective mapping, so the quality from this
resampling process will be higher. Only the screen space
resampler has to deal with perspective resampling, and it is
only applied to shaded samples on the surface grid once.

A disadvantage is that more samples are shaded than in
a traditional pixel shading pipeline, since we have roughly
twice as many surface samples as final pixels. This is due
to mipmapping maintaining a minification factor between 1
and 2 in each direction (so we have roughly 1.5x1.5 surface
samples per pixel). The high-quality antialiasing will
however make sure that sub-pixel details still contribute to
the final image, thereby further improving image quality.

Another disadvantage can be that secondary textures are
now resampled twice (once by the texture space resampler
and again by the screen space resampler), which might
introduce extra blurriness. This is why we select the texture
map with the highest detail as primary texture, to ensure that
the finest details are only resampled once. The secondary
textures will have a smaller minification (or even
magnification, as is usually the case with for example light
maps) for which some extra blur is not very noticeable. The
screen space resampler also enables the use of high quality
sharpness enhancement filters, which can also help with
maintaining a sharp image.

3.2.5. The screen space resampler

The shaded colour sample resulting from the pixel shading
process is forwarded to the screen space resampler along
with its screen coordinates. The screen space resampler
resamples these colour samples (located generally at
non-integer pixel positions) to the integer pixel positions
needed for display.

When rendering larger primitives or using a wide prefilter,
a two-pass approach using 1D filters is more efficient than
directly using a 2D filter kernel. Also, the 1D resamplers
can be constructed such that no renormalisation is required
to avoid DC ripple. Two-pass filters do require extra setup
to alleviate for example the bottleneck and shear problems
discussed in Section 3.2.2. The alternative is to use 2D
resample kernels to perform the splatting.
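
One way a 1D forward resampler can avoid renormalisation is
sketched below. It is not the filter used in the paper: it
assumes box reconstruction of the input samples and a box
prefilter one pixel wide, so that each output pixel simply
integrates the piecewise-constant input over its own extent.
The function name and the edge-based input format are
illustrative.

```python
import math

def splat_1d(colours, edges, width):
    """Forward-map one row of samples to `width` integer pixels.

    colours[i] covers the screen interval [edges[i], edges[i + 1]],
    so len(edges) == len(colours) + 1 and edges must be increasing.
    Pixel j accumulates the covered part of [j, j + 1].
    """
    out = [0.0] * width
    for c, a, b in zip(colours, edges, edges[1:]):
        first = max(0, math.floor(a))
        last = min(width - 1, math.ceil(b) - 1)
        for j in range(first, last + 1):
            overlap = min(b, j + 1) - max(a, j)
            if overlap > 0:
                out[j] += c * overlap
    return out

# A constant input stays constant even though the sample
# positions are irregular and non-integer.
print(splat_1d([1.0, 1.0, 1.0, 1.0], [0.0, 0.7, 1.5, 2.2, 3.0], 3))
```

Because the overlaps of all samples with a fully covered pixel
sum to one, a constant input yields a constant output without
dividing by an accumulated weight. A two-pass resampler would
apply such a 1D step along rows and then along columns.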

Figure 6: A scene rendered with (a) a traditional pipeline using 2x2 multisampling, (b) the proposed hybrid pipeline



The presence of this screen space splatting filter also
enables direct support for point primitives 18, which widens
the applicability of the pipeline.

The resulting pixel fragments are forwarded to the EAA
& HSR unit, which is the same as in the forward mapping
pipeline.



4. Results 

To demonstrate the concept, we took the software prototype
of the forward mapping pipeline and extended it, so the
rasteriser can traverse arbitrary grids over the surface of the
primitive. Since this pipeline was based on an older version
of Mesa 10, we did not have a full pixel shader available, so
we decided to only implement multi-texturing to show that
the concept works.

The degree of programmability of the shader is relatively
unimportant when showing that it is easy to combine several
textures, so all that is lacking from this prototype is a
demonstration of dependent texturing, but since the pixel
shader is the same as in a traditional pipeline, exactly the
same modifications to the texture addresses can be made prior
to fetching texels, so the support for dependent texturing is
trivial.

Figure 6 shows a comparison of a scene 16 from the game
Quake III rendered by a traditional pipeline and the
prototype of the hybrid pipeline. Picture 6a exhibits aliasing
for example on the staircase, even though 2x2 multi-sampling
was enabled. In picture 6b, this aliasing is virtually gone,
even though the computational and bandwidth costs for
generating this picture are roughly the same as for picture 6a.
The shadows on the floor and lamp-light on the walls attest
to the use of light maps, which are rendered using the
multi-texturing "shader" of the prototype of the hybrid
pipeline.



5. Conclusions

A traditional inverse mapping pipeline can be transformed
into the proposed hybrid inverse/forward mapping pipeline.
This involves generalising the rasteriser to enable
rasterisation over a surface grid. It also involves adding a
screen space resampler. And it involves exchanging the
multi-sample buffer, z-buffer and down-sample filter in the HSR
& EAA unit for a fragment buffer and fragment combiner
logic.

The combined architecture features both the advantages
offered by programmable pixel shading and the high quality
anti-aliasing offered at low cost by the forward mapping
technique. Existing pixel shading programs can execute on
the new architecture without modification.

The proposed architecture also enables new features, such
as support for point primitives (due to the addition of the
screen space resampler which performs the required
splatting), and support for motion blur (due to the ability of
the rasteriser to traverse the primitive in the direction of
motion).